QL2, a simple reinforcement learning scheme for two-player zero-sum Markov games
Author
Abstract
Markov games are a framework that can be used to formalise n-agent reinforcement learning (RL). Littman (Markov games as a framework for multi-agent reinforcement learning, in: Proceedings of the 11th International Conference on Machine Learning (ICML-94), 1994) uses this framework to model two-agent zero-sum problems and, within this context, proposes the minimax-Q algorithm. This paper reviews RL algorithms for two-player zero-sum Markov games and introduces a new, simple, fast algorithm called QL2. QL2 is compared to several standard algorithms (Q-learning, Minimax and minimax-Q) implemented with the Qash library written in Python. The experiments show that QL2 converges empirically to optimal mixed policies, as minimax-Q does, but uses a surprisingly simple and cheap updating rule. © 2009 Elsevier B.V. All rights reserved.
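To make the comparison concrete, here is a minimal sketch of the kind of tabular backup minimax-Q performs. The abstract does not give QL2's updating rule, so it is not reproduced here; note also that Littman's minimax-Q computes the state value V(s') as the value of the matrix game Q(s', ·, ·) over *mixed* strategies (a linear program), whereas this sketch approximates it with a pure-strategy maximin for brevity. All names (states, actions, parameters) are illustrative.

```python
from collections import defaultdict

ACTIONS = ["rock", "paper", "scissors"]

def maximin_value(q_s):
    # Pure-strategy maximin over the matrix game q_s[(a, o)].
    # (True minimax-Q solves an LP over mixed strategies instead.)
    return max(min(q_s[(a, o)] for o in ACTIONS) for a in ACTIONS)

def minimax_q_update(Q, s, a, o, reward, s_next, alpha=0.1, gamma=0.9):
    # One backup of Q(s, a, o) toward r + gamma * V(s'),
    # where o is the opponent's action, as in minimax-Q.
    target = reward + gamma * maximin_value(Q[s_next])
    Q[s][(a, o)] += alpha * (target - Q[s][(a, o)])

# Q[s] maps joint actions (a, o) to values, initialised to zero.
Q = defaultdict(lambda: {(a, o): 0.0 for a in ACTIONS for o in ACTIONS})
minimax_q_update(Q, "s0", "rock", "scissors", reward=1.0, s_next="s1")
```

Standard Q-learning differs only in the backup: it would take max over own actions of Q(s', a), ignoring the opponent, which is why it can fail to learn mixed equilibrium policies in zero-sum games.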
Similar resources
QL2, a simple reinforcement learning scheme for two-player zero-sum Markov games
Markov games are a framework which formalises n-agent reinforcement learning. For instance, Littman proposed the minimax-Q algorithm to model two-agent zero-sum problems. This paper proposes a new simple algorithm in this framework, QL2, and compares it to several standard algorithms (Q-learning, Minimax and minimax-Q). Experiments show that QL2 converges to optimal mixed policies, as minimax-Q...
Convergent Multiple-timescales Reinforcement Learning Algorithms in Normal Form Games
We consider reinforcement learning algorithms in normal form games. Using two-time-scales stochastic approximation, we introduce a modelfree algorithm which is asymptotically equivalent to the smooth fictitious play algorithm, in that both result in asymptotic pseudotrajectories to the flow defined by the smooth best response dynamics. Both of these algorithms are shown to converge almost surel...
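The two-timescales pattern mentioned in this snippet can be illustrated with a small sketch (not taken from the paper): two coupled stochastic-approximation iterates use step sizes a_n and b_n with b_n / a_n → 0, so the fast variable equilibrates against the slowly moving one. The specific update maps below are illustrative, and noise terms are omitted to keep the example deterministic.

```python
def two_timescale(steps=20000):
    # Fast iterate x and slow iterate y, coupled through their updates.
    x, y = 0.0, 0.0
    for n in range(1, steps + 1):
        a_n = n ** -0.6        # fast step size
        b_n = 1.0 / (n + 1)    # slow step size; b_n / a_n -> 0
        x += a_n * (y - x)     # fast: tracks the current value of y
        y += b_n * (1.0 - y)   # slow: drifts toward its target 1
    return x, y

x, y = two_timescale()
```

Because the fast iterate sees the slow one as quasi-static, the pair behaves asymptotically like the slow dynamics evaluated at the fast iterate's equilibrium, which is the mechanism behind the equivalence to smooth fictitious play claimed above.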
Learning in Markov Games with Incomplete Information
The Markov game (also called stochastic game (Filar & Vrieze 1997)) has been adopted as a theoretical framework for multiagent reinforcement learning (Littman 1994). In a Markov game, there are n agents, each facing a Markov decision process (MDP). All agents’ MDPs are correlated through their reward functions and the state transition function. As Markov decision process provides a theoretical ...
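The objects in this definition can be sketched as a small data structure: n = 2 agents whose MDPs share a transition function and, in the zero-sum case, have opposite rewards. All names below are illustrative, with matching pennies as a one-state example.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class ZeroSumMarkovGame:
    states: List[str]
    actions_a: List[str]                        # agent A's action set
    actions_b: List[str]                        # agent B's action set
    transition: Callable[[str, str, str], str]  # (s, a, b) -> s', shared
    reward: Callable[[str, str, str], float]    # (s, a, b) -> payoff to A

    def reward_b(self, s, a, b):
        # Zero-sum coupling: B's reward is the negation of A's.
        return -self.reward(s, a, b)

# Matching pennies: one state, A wins when the two actions match.
game = ZeroSumMarkovGame(
    states=["s0"],
    actions_a=["heads", "tails"],
    actions_b=["heads", "tails"],
    transition=lambda s, a, b: "s0",
    reward=lambda s, a, b: 1.0 if a == b else -1.0,
)
```

The correlation described in the snippet is visible here: neither agent's MDP can be solved in isolation, because both the next state and each agent's reward depend on the joint action (a, b).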
Reinforcement learning with restrictions on the action set
Consider a 2-player normal-form game repeated over time. We introduce an adaptive learning procedure, where the players only observe their own realized payoff at each stage. We assume that agents do not know their own payoff function, and have no information on the other player. Furthermore, we assume that they have restrictions on their own actions such that, at each stage, their choice is lim...
Journal:
Volume/Issue:
Pages: -
Publication year: 2009